Midterm exams

This is a "closed book" examination: you may not use any resources outside of this notebook (except pen and paper). You may consult help from within the notebook using ?, but not any online references. Turn wireless off or set your laptop to "Airplane" mode before taking the exam.

You have 2 hours to complete the exam.


In [38]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns

Q1 (10 points)

Given the 2 matrices

A = np.array([[1,2,3],[4,5,6]])
B = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])

Perform matrix multiplication of A and B using the following methods:

  1. Using nested for loops without the dot function (4 points)
  2. Using numpy (2 points)
  3. Using R (start the first line of a new cell with %%R). You should pass in the A and B matrices defined in Python for full marks, but partial credit will be given if you redefine them in R (4 points)

In [49]:
import numpy as np

In [40]:
A = np.array([[1,2,3],[4,5,6]])
B = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])

In [41]:
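# A is m x n and B is n x p, so the product C is m x p
m, n = A.shape
n, p = B.shape
C = np.zeros((m, p))
for i in range(m):          # rows of A
    for j in range(p):      # columns of B
        for k in range(n):  # shared inner dimension
            C[i,j] += A[i,k] * B[k, j]
C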


Out[41]:
array([[  38.,   44.,   50.,   56.],
       [  83.,   98.,  113.,  128.]])

In [42]:
A @ B


Out[42]:
array([[ 38,  44,  50,  56],
       [ 83,  98, 113, 128]])

In [51]:
%load_ext rpy2.ipython


The rpy2.ipython extension is already loaded. To reload it, use:
  %reload_ext rpy2.ipython

In [43]:
%R -iA,B A %*% B


Out[43]:
array([[  38.,   44.,   50.,   56.],
       [  83.,   98.,  113.,  128.]])
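
As a cross-check on the three answers above, here is a minimal sketch of the same product using only built-in lists. The names matmul_lists, A_list and B_list are hypothetical and not part of the exam; the sketch assumes both inputs are non-empty rectangular lists of lists.

```python
# Hypothetical helper: multiply two matrices stored as nested lists.
def matmul_lists(a, b):
    # The inner dimensions must agree: columns of a == rows of b.
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    # zip(*b) iterates over the columns of b, so each entry of the
    # result is the dot product of one row of a with one column of b.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

A_list = [[1, 2, 3], [4, 5, 6]]
B_list = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
C_list = matmul_lists(A_list, B_list)
print(C_list)  # [[38, 44, 50, 56], [83, 98, 113, 128]]
```

The entries match the outputs of the loop, numpy and R versions.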

In [44]:
%R -o iris

Q2 (10 points)

Read the data/iris.csv data set into a Pandas DataFrame, and answer the following questions:

  • Find the mean, min and max values of all four measurements (sepal.length, sepal.width, petal.length, petal.width) for each species
  • Find the average values of each measurement for rows where the petal.length is less than the sepal.width

In [50]:
import pandas as pd

In [47]:
df = pd.read_csv('data/iris.csv')
df.groupby('Species').agg(['mean', 'min', 'max'])


Out[47]:
            Sepal.Length       Sepal.Width        Petal.Length       Petal.Width
               mean  min  max     mean  min  max     mean  min  max     mean  min  max
Species
setosa        5.006  4.3  5.8    3.428  2.3  4.4    1.462  1.0  1.9    0.246  0.1  0.6
versicolor    5.936  4.9  7.0    2.770  2.0  3.4    4.260  3.0  5.1    1.326  1.0  1.8
virginica     6.588  4.9  7.9    2.974  2.2  3.8    5.552  4.5  6.9    2.026  1.4  2.5

In [48]:
# numeric_only=True skips the string-valued Species column
df[df['Petal.Length'] < df['Sepal.Width']].mean(numeric_only=True)


Out[48]:
Sepal.Length    5.006
Sepal.Width     3.428
Petal.Length    1.462
Petal.Width     0.246
dtype: float64

Q3 (10 points)

Find the longest sequence of repeated letters (e.g. 'AAA') in the string below. Print 1) the length, 2) the index of the starting location, 3) the actual sequence. If there are ties, print the last sequence found. You can assume that only the letters A, C, T and G are found in the string.

TGTAGTCCATGCGGAATTCCACAGGGGCTCTGGGGACAGATTCGGACCTTTCTGTCAACGCCAATCATGGAGGTAGTGTGAGGTATAAATTTGGTCGGCGTAGGTCAAGAAAACCCACCTGCGCTGCTGTACGACACATGGCCGAGGCTTCAAGGGCATTCCACGAAGAGGCTCATGGCAACGCCTCTCGAAAGCTGGCGCTCAGGAAGGTACGATCACCCTCGAAATCAAAGATTTCATCTGAAATAAAAGTTAGTACGCCACTTTAGGGTATCGAGTACTTACCCATTTATAACGGAGGCTGAGCGAACGCTTGGCTGATGAAAAAACAACACTCGGTATAAACGGCGATTTCCACTGATCCAGGTAAAGCATGTTTGTGGATAGCAAGGGCAAGTAGTATGCAGCGAGTTTCGTGACAGTATAGCTCGACATGTATATCTCTGTGGGCGCATTTGGATGCTGTATACTGTAGAAGCAGTATATTCCCTGATGACCGAACTTACTACAAGTTGTTGTCTCGACAGGTAGTACGTGTGATCTGTGTCTGAGACCTGCAACTGGTGCGCATTGAAACTTCGTACATAAACCTACCGACTTCACCGTTTCGGCGTCGGCTTGTAACTGGAGAGTGTTGTTGCGTCATGGTCGATTGAGGATTTGGCCTAAATGTAGCGCGTATACACTGCATTATTAGCGGCTTCGAGGAACATGTAATGGGCGAGGACAGAGAATTGTATGAGATTCAAACTGCCAGGTTTTATGGCGGACCCCTGCTCCCATTGTAATCGACCGGCGGCTGGGGTACGCCCGCACGAGGGTATCGGTAGTATATCTAGCTAAGCTCCGGTGTATGCTGTTGAGACACCATTCATGCGCAAAGCCCCACCGTGCACGCATGCGATGATAAATAAGGATGACTATGGCTTACAGAGATCTTTTTCAGGGGCGTCTTGCAATAATGGTTGATAAATGTGTTTTGCCGAATCAACTGCGCGGC

In [52]:
import re

In [53]:
s = "TGTAGTCCATGCGGAATTCCACAGGGGCTCTGGGGACAGATTCGGACCTTTCTGTCAACGCCAATCATGGAGGTAGTGTGAGGTATAAATTTGGTCGGCGTAGGTCAAGAAAACCCACCTGCGCTGCTGTACGACACATGGCCGAGGCTTCAAGGGCATTCCACGAAGAGGCTCATGGCAACGCCTCTCGAAAGCTGGCGCTCAGGAAGGTACGATCACCCTCGAAATCAAAGATTTCATCTGAAATAAAAGTTAGTACGCCACTTTAGGGTATCGAGTACTTACCCATTTATAACGGAGGCTGAGCGAACGCTTGGCTGATGAAAAAACAACACTCGGTATAAACGGCGATTTCCACTGATCCAGGTAAAGCATGTTTGTGGATAGCAAGGGCAAGTAGTATGCAGCGAGTTTCGTGACAGTATAGCTCGACATGTATATCTCTGTGGGCGCATTTGGATGCTGTATACTGTAGAAGCAGTATATTCCCTGATGACCGAACTTACTACAAGTTGTTGTCTCGACAGGTAGTACGTGTGATCTGTGTCTGAGACCTGCAACTGGTGCGCATTGAAACTTCGTACATAAACCTACCGACTTCACCGTTTCGGCGTCGGCTTGTAACTGGAGAGTGTTGTTGCGTCATGGTCGATTGAGGATTTGGCCTAAATGTAGCGCGTATACACTGCATTATTAGCGGCTTCGAGGAACATGTAATGGGCGAGGACAGAGAATTGTATGAGATTCAAACTGCCAGGTTTTATGGCGGACCCCTGCTCCCATTGTAATCGACCGGCGGCTGGGGTACGCCCGCACGAGGGTATCGGTAGTATATCTAGCTAAGCTCCGGTGTATGCTGTTGAGACACCATTCATGCGCAAAGCCCCACCGTGCACGCATGCGATGATAAATAAGGATGACTATGGCTTACAGAGATCTTTTTCAGGGGCGTCTTGCAATAATGGTTGATAAATGTGTTTTGCCGAATCAACTGCGCGGC"

In [97]:
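n = 0
idx = None

current = s[0]
count = 1
for i, ch in enumerate(s[1:], 1):
    if ch == current:
        count += 1
    else:
        # A run just ended at position i; >= keeps the last run on ties
        if count >= n:
            n = count
            idx = i - count
        count = 1
        current = ch

# The final run is never closed inside the loop, so check it here
if count >= n:
    n = count
    idx = len(s) - count

print(n, idx, s[idx:(idx+n)])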


6 323 AAAAAA

In [90]:
n = 0
idx = None

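# (.)\1+ matches a character followed by one or more copies of itself
for m in re.finditer(r'(.)\1+', s):
    x = m.group(0)       # the whole run, e.g. 'AAA'
    if len(x) >= n:      # >= keeps the last run on ties
        n = len(x)
        idx = m.start()

print(n, idx, s[idx:(idx+n)])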


6 323 AAAAAA

In [88]:
n = 0
idx = None

# Each alternative matches a maximal run of a single letter
for m in re.finditer(r'(A+|C+|T+|G+)', s):
    x = m.group(1)
    if len(x) >= n:      # >= keeps the last run on ties
        n = len(x)
        idx = m.start()

print(n, idx, s[idx:(idx+n)])


6 323 AAAAAA
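
A fourth approach, not part of the original solutions, is to let itertools.groupby split the string into runs. The sketch below is illustrative only: the function name longest_run and the short test string are hypothetical, and it assumes a non-empty input.

```python
from itertools import groupby

def longest_run(seq):
    """Return (length, start, run) for the longest run of one repeated
    character in seq. Ties are resolved in favour of the last run."""
    if not seq:
        raise ValueError("seq must not be empty")
    best_len, best_start = 0, 0
    pos = 0  # start index of the current run
    for _, group in groupby(seq):
        length = sum(1 for _ in group)
        if length >= best_len:  # >= keeps the last run on ties
            best_len, best_start = length, pos
        pos += length
    return best_len, best_start, seq[best_start:best_start + best_len]

# 'CCC' and 'TTT' tie at length 3, so the later run is reported.
print(longest_run("AACCCGTTTA"))  # (3, 6, 'TTT')
```

Calling longest_run(s) on the exam string should return the same three values that the solutions above print.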

Q4 (10 points)

Euclid's algorithm for finding the greatest common divisor of two numbers is

gcd(a, 0) = a
gcd(a, b) = gcd(b, a modulo b)
  1. Write a function to find the greatest common divisor in Python (4 points)
  2. What is the greatest common divisor of 17384 and 1928? (1 point)
  3. Write a function to calculate the least common multiple (4 points)
  4. What is the least common multiple of 17384 and 1928? (1 point)

Note:

  • The greatest common divisor of two integers is the largest positive integer that divides both numbers
  • The least common multiple of two numbers is the smallest number (not zero) that is a multiple of both.

In [98]:
def gcd(a, b):
    if b == 0:
        return a
    else:
        return gcd(b, a % b)

In [99]:
gcd(17384, 1928)


Out[99]:
8

In [104]:
def lcm(a, b):
    return (a*b) // gcd(a, b)

In [105]:
lcm(17384, 1928)


Out[105]:
4189544
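
The recursive definition can also be written as a loop. The following is a minimal sketch, assuming positive integer inputs; the names gcd_iter and lcm_iter are hypothetical, and math.gcd from the standard library is used only as a cross-check.

```python
import math

def gcd_iter(a, b):
    # Repeatedly replace (a, b) with (b, a mod b) until b is 0,
    # which mirrors gcd(a, b) = gcd(b, a modulo b) and gcd(a, 0) = a.
    while b != 0:
        a, b = b, a % b
    return a

def lcm_iter(a, b):
    # Divide before multiplying to keep the intermediate value small.
    # Assumes a and b are positive, so the gcd is never zero.
    return a // gcd_iter(a, b) * b

print(gcd_iter(17384, 1928))  # 8
print(lcm_iter(17384, 1928))  # 4189544
print(gcd_iter(17384, 1928) == math.gcd(17384, 1928))  # True
```

For these inputs the loop visits the pairs (17384, 1928), (1928, 32), (32, 8) and (8, 0), then returns 8.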

Q5 (10 points)

Write a function to flatten a list of lists using

  1. For loops (2 points)
  2. List comprehensions (4 points)
  3. The reduce higher-order function (4 points)

For example,

flatten([[1,2], [3,4,5],[6,7,8,9]])

should return

[1,2,3,4,5,6,7,8,9]

In [106]:
def flatten1(list_of_lists):
    xs = []
    for alist in list_of_lists:
        for item in alist:
            xs.append(item)
    return xs

In [111]:
def flatten2(list_of_lists):
    return [item for alist in list_of_lists for item in alist]

In [133]:
from functools import reduce

In [135]:
def flatten3(list_of_lists):
    # reduce already returns a list here, so no extra list() call is needed
    return reduce(lambda a, b: a + b, list_of_lists, [])

In [ ]:
xs = [[1,2], [3,4,5],[6,7,8,9]]

In [124]:
flatten1(xs)


Out[124]:
[1, 2, 3, 4, 5, 6, 7, 8, 9]

In [125]:
flatten2(xs)


Out[125]:
[1, 2, 3, 4, 5, 6, 7, 8, 9]

In [136]:
flatten3(xs)


Out[136]:
[1, 2, 3, 4, 5, 6, 7, 8, 9]
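
Outside the three methods the question asks for, the standard library offers itertools.chain.from_iterable for the same job. This is a minimal sketch; the name flatten_chain is hypothetical, and like the answers above it flattens one level of nesting only.

```python
from itertools import chain

def flatten_chain(list_of_lists):
    # chain.from_iterable lazily yields the items of each inner list in order.
    return list(chain.from_iterable(list_of_lists))

print(flatten_chain([[1, 2], [3, 4, 5], [6, 7, 8, 9]]))
# [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The reduce version builds a new list at every step with a + b, so for many inner lists the chain version is likely to do less copying.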